ABSTRACT. In [BCGM01] we generalized the Knuth-Morris-Pratt (KMP) pattern matching algorithm and defined a
non-conventional kind of RAM, the MP-RAMs, which model microprocessor operations more closely, and designed an $O(n)$
on-line algorithm for solving the serial episode matching problem on MP-RAMs when there is only a single episode. We
give here two extensions of this algorithm to the case when we search for several patterns simultaneously, and compare them.
More precisely, given $q + 1$ strings (a text $t$ of length $n$ and $q$ patterns $m_1, \ldots, m_q$) and a natural number $w$, the multiple
serial episode matching problem consists in finding the number of size $w$ windows of text $t$ which contain patterns $m_1, \ldots, m_q$
as subsequences, i.e. for each $m_i$, if $m_i = p_1 \ldots p_k$, the letters $p_1, \ldots, p_k$ occur in the window, in the same order as in $m_i$,
but not necessarily consecutively (they may be interleaved with other letters).
1. Introduction



The recent development of data mining has induced the development of computing techniques, among them
episode searching and counting. An example of frequent serial episode search is as follows: let $t$ be a text consisting
of requests to a university webserver; assume we wish to count how many times, within at most 10 time units,
the sequence $e_1 e_2 e_3 e_4$ appears, where $e_1$ = 'Computer Science', $e_2$ = 'Master', $e_3$ = 'CS318 homepage', $e_4$ =
'Assignment'. It suffices to count the number of 10-windows of $t$ containing the subsequence $p = e_1 e_2 e_3 e_4$. If
$e_1, e_2, e_3, e_4$ must appear in that same order in the window, the episode is said to be serial; if they can appear in any
order, the episode is said to be parallel; a partial order can also be imposed on the events composing an episode (see
[MTV95], which proposes several algorithms for episode searching). Searching serial episodes is more complex
than searching parallel episodes. Of course, if one has to scan a log file, it is better to do it for several episodes
$e_1 e_2 \ldots e_n$, $f_1 f_2 \ldots f_m$, $g_1 g_2 \ldots g_p$ simultaneously. We will hence investigate the search of several serial episodes
in the same window: each serial episode is ordered, but no order is imposed among occurrences of the episodes in
the window.

The problem we address is the following: given a text $t$ of length $n$, patterns $m_1, \ldots, m_q$ on the same alphabet
$A$ and an integer $w$, we wish to determine the number of size $w$ windows of text containing all $q$ patterns as
serial episodes, i.e. the letters of each $m_i$ appear in the window, in the same order as in $m_i$, but they need not
be consecutive because other letters can be interleaved. When searching for a single pattern $m$, this problem with
arguments the window size $w$, the text $t$ and pattern $m$ is called serial episode matching problem in [MTV95],
episode matching in [DFGGK97] and subsequence matching in [AHU74]; a related problem is the matching with
don't cares of [MBY91], [KR97].

This problem is an interesting generalisation of pattern-matching. Without the window size restriction, it is
easy to find in linear time whether $p$ occurs in the text: if $p = p_1 \ldots p_k$, a finite state automaton with $k + 1$ states
$s_0, s_1, \ldots, s_k$ will read the text; the initial state is $s_0$; after reading letter $p_1$ we go to state $s_1$, then after reading
letter $p_2$ we go to state $s_2$, and so on; the text is accepted as soon as state $s_k$ is reached. Episode matching within a
$w$-window is harder; its importance is due to potential applications to data mining [M97, MTV95] and molecular
biology [MBY91], [KR97], [NR02].
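The automaton without window restriction can be sketched in a few lines. The following is only an illustration in Python (the function name `occurs_as_subsequence` is ours, not taken from the cited papers): the state is the number of letters of $p$ matched so far.

```python
def occurs_as_subsequence(p: str, text: str) -> bool:
    """Simulate the (k+1)-state automaton s_0, ..., s_k.

    The integer `state` plays the role of s_state: it counts how many
    letters of p have been read, in order, so far.
    """
    k = len(p)
    if k == 0:
        return True          # the empty word is a subsequence of any text
    state = 0
    for letter in text:      # each text letter is read exactly once
        if letter == p[state]:
            state += 1
            if state == k:   # accepting state s_k reached
                return True
    return False
```

Each text letter triggers at most one comparison, which gives the linear time bound mentioned above.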

For the problem with a single episode in $w$-windows, a standard algorithm is described in [DFGGK97], [MTV95].
It is close to the algorithms of pattern-matching [A90, AHU74] and its time complexity is $O(nk)$. Another on-line
algorithm is described in [DFGGK97]: the idea is to slice the pattern in $k/\log k$ well-chosen pieces organised in a
trie; its time complexity is $O(nk/\log k)$. We gave in [BCGM01] an on-line algorithm reading the text $t$, each text symbol being
read only once, whose time complexity is $O(n)$.

In this paper, we describe two efficient algorithms (Section 3) for solving the problems of simultaneous search
of multiple episodes. These algorithms use the MP-RAM, which we introduced in [BCGM01] to model microprocessor
basic operations, using only the fast operations on bits (shifts) and bit-wise addition; this gives an on-line
algorithm in time $O(nq)$ (Theorem 1). In practice, this algorithm, based on MP-RAMs and a new implementation
of tries, is much faster, as shown in Section 4. We believe that other algorithms can be considerably improved if
programmed on MP-RAMs.

Our algorithm relies upon two ideas: 1) preprocess patterns and window size to obtain a finite automaton
solving the problem as in the Knuth, Morris, and Pratt algorithm [KMP77] (the solutions preprocessing the text [T02],
[MBY91], [S71], [U95] are prohibitive here because of their space complexity) and 2) code the states of this automaton
to compute its transitions very quickly on MP-RAMs, without precomputing or storing the automaton: using the
automaton itself is also prohibitive, not least because of the number of states; we emulate the behaviour of
the automaton without computing the automaton. We study: (a) the case when the patterns have no common part
and (b) the case when they have similar parts. In each case, an appropriate preprocessing of the set of patterns
enables us to build an automaton solving the problem and we show that the behaviour of this automaton can be
emulated on-line on MP-RAMs. Moreover, the time complexity of the preprocessing is insignificant because the
patterns and window are smaller than the text by several orders of magnitude: typically, window and patterns will consist of a few
dozen characters while the text will consist of several million characters.

The paper is organised as follows: in Section 2, we define the problem; in Section 3, we describe the algorithms
searching multiple episodes in parallel; we present the experimental results in Section 4.






Figure 1: A French advertisement

Figure 2: A text with two 5-windows containing "vie" (in gray), and a single 5-window containing "vile".

2. The problem 

2.1. The (multiple) episode problem 

An alphabet is a finite non-empty set $A$. A length $n$ word on $A$ is a mapping $t$ from $\{1, \ldots, n\}$ to $A$. The only
length zero word is the empty word, denoted by $\varepsilon$. A non-empty word $t : i \mapsto t_i$ is denoted by $t_1 t_2 \cdots t_n$. A
language on alphabet $A$ is a set of words on $A$.

Let $t = t_1 t_2 \cdots t_n$ be a word which will be called the text in the paper. The word $p = p_1 p_2 \cdots p_k$ is a factor
of $t$ iff there exists an integer $j$ such that $t_{j+i} = p_i$ for $1 \le i \le k$. A size $w$ window on $t$, in short $w$-window,
is a size $w$ factor $t_{i+1} \cdots t_{i+w}$ of $t$; there are $n - w + 1$ such windows in $t$. The word $p$ is an episode (or
subsequence) of $t$ iff there exist integers $1 \le i_1 < i_2 < \cdots < i_k \le n$ such that $t_{i_j} = p_j$ for $1 \le j \le k$. If
moreover $i_k - i_1 < w$, $p$ is an episode of $t$ in a $w$-window.

Example 1 If $t$ = "dans ville il y a vie" (a French advertisement, see Figure 1) then "vie" is a factor and hence a
subsequence of $t$. "vile" is neither a factor, nor a subsequence of $t$ in a 4-window, but it is a subsequence of $t$ in a
5-window. See Figure 2.
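The definitions of Section 2.1 can be checked directly on Example 1. The sketch below is a naive illustration, not the algorithm of this paper; the helper names `min_span` and `is_episode_in_window` are ours, and patterns are assumed non-empty. It uses the condition $i_k - i_1 < w$, i.e. the matched positions span at most $w$ letters.

```python
def min_span(p: str, t: str):
    """Smallest value of i_k - i_1 + 1 over all embeddings of p in t, or None."""
    best = None
    for start, letter in enumerate(t):
        if letter != p[0]:
            continue
        pos = start
        for ch in p[1:]:
            pos = t.find(ch, pos + 1)   # leftmost next occurrence is optimal
            if pos < 0:
                break
        else:
            span = pos - start + 1
            best = span if best is None else min(best, span)
    return best


def is_episode_in_window(p: str, t: str, w: int) -> bool:
    """True iff p is a subsequence of t with i_k - i_1 < w."""
    span = min_span(p, t)
    return span is not None and span <= w


t = "dans ville il y a vie"
print("vie" in t)                          # factor test
print(is_episode_in_window("vile", t, 4))
print(is_episode_in_window("vile", t, 5))
```

For each candidate first position, matching every following letter at its leftmost occurrence yields the shortest embedding starting there, which is why the greedy inner loop suffices.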

Given an alphabet $A$, and words $t, m_1, \ldots, m_q$ on $A$:

- the simultaneous pattern-matching problem consists in finding whether $m_1, \ldots, m_q$ are factors of $t$,

- given moreover a window size $w$:

- the subsequence existence problem consists in finding whether $m_1, \ldots, m_q$ are subsequences of $t$ in a
$w$-window;

- the multiple episode search problem consists in counting the number of $w$-windows in which all of
$m_1, \ldots, m_q$ are subsequences of $t$.

For the simultaneous search of several subsequences $m_1, \ldots, m_q$, we have various different problems:

- either we count the number of occurrences of each $m_i$ in a $w$-window (not necessarily the same): this case
will be useful for searching in parallel, with a single scan of the text, a set of patterns which are candidates for
being frequent.

- or we count the number of windows containing all the $m_i$'s: this case will be useful for trying to verify
association rules. For example, the association rule $m_2, \ldots, m_q \Rightarrow m_1$ will be useful if the number of $w$-windows
containing all the $m_2, \ldots, m_q$ is high enough, and to check that, we will count the $w$-windows containing all of
$m_2, \ldots, m_q$. Our method will enable us to verify more easily both the validity of the association rule ("among
the windows containing $m_2, \ldots, m_q$ many contain also $m_1$") and the fact that it is interesting enough ("many
windows contain $m_2, \ldots, m_q$"): it will suffice to count simultaneously the windows containing $m_2, \ldots, m_q$ and
the windows containing $m_1, m_2, \ldots, m_q$.
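The two counts needed to check an association rule can be illustrated by a brute-force reference implementation. This is a sketch for small inputs only (it rescans every window), and the names `count_windows` and `rule_confidence` are hypothetical; the MP-RAM algorithm described later obtains the same counts in a single scan.

```python
def contains(window: str, p: str) -> bool:
    """True iff p is a subsequence of window."""
    it = iter(window)
    return all(ch in it for ch in p)   # `in` consumes the iterator up to the match


def count_windows(t: str, w: int, patterns) -> int:
    """Number of size-w windows of t containing every pattern as a subsequence."""
    return sum(
        all(contains(t[i:i + w], p) for p in patterns)
        for i in range(len(t) - w + 1)
    )


def rule_confidence(t: str, w: int, body, head):
    """Support of the rule body, and the fraction of those windows also containing head."""
    support = count_windows(t, w, list(body))
    both = count_windows(t, w, list(body) + [head])
    return support, (both / support if support else None)


t = "dans ville il y a vie"
print(count_windows(t, 5, ["vie"]), count_windows(t, 5, ["vile"]))
print(rule_confidence(t, 5, ["vie"], "vile"))
```

On the text of Example 1 the counts agree with Figure 2: two 5-windows contain "vie" and one contains "vile".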

A naive solution exists for pattern-matching. Its time complexity on RAM is $O(nk)$, where $k$ is the pattern size.
Knuth, Morris, and Pratt [KMP77] gave a well-known algorithm solving the problem in linear time $O(n + k)$. A
solution in $O(nk)$ is given in [MTV95] for searching a single size $k$ episode. We gave in [BCGM01] an algorithm
with time complexity $O(n)$ (on MP-RAM) for searching a single episode.
2.2. The notation $o(nk)$

Let us first make precise the meaning of the notation $o(nk)$.

The notation $o(h(n))$ was introduced to compare growth rates of functions with one argument; for comparing
functions with several arguments, various non-equivalent interpretations of $o(h(n, m, \ldots))$ are possible. Consider a
function $t(n, k)$; $t(n, k) = o(nk)$ could mean:

1) either $\lim_{n+k \to +\infty} t(n, k)/nk = 0$;

2) or $\lim_{n \to +\infty,\ k \to +\infty} t(n, k)/nk = 0$, i.e. $\forall \varepsilon, \exists N, \forall n, \forall k\ (n > N \text{ and } k > N \Rightarrow t(n, k) < \varepsilon nk)$.

With meaning 1, no algorithm can solve the single episode within a window problem in time $o(nk)$. Indeed,
any algorithm for the episode within a window problem must scan the text at least once, hence $t(n, k) \ge n$. For a
given $k$, for example $k = 2$, we have $t(n, k)/nk \ge 1/2$. Hence $\lim_{n+k \to +\infty} t(n, k)/nk = 0$ is impossible. We thus
have to choose meaning 2.



2.3. Algorithms on MP-RAM 

Given a window size $w$ and $q$ patterns, we preprocess (patterns + window size $w$) to build a virtual finite
state automaton $\mathcal{A}$; we will then emulate on-line the behaviour of $\mathcal{A}$ to scan text $t$ and count in time $O(nq)$ the
number of windows containing our patterns as episodes. Note that our method is different from both: 1) methods
preprocessing the text [T02], [MBY91], [S71], [U95] (we preprocess the pattern) and 2) methods using suffixes of the
pattern [C88], [MBY91], [KR97], [U95] (we use prefixes of the patterns). We encode the subset of states of $\mathcal{A}$ needed to
compute the transitions on-line on an MP-RAM. Indeed, $\mathcal{A}$ has $O((w + 1)^k)$ states, where $k$ is the size of the structure
encoding the $q$ patterns $m_1, \ldots, m_q$; for $w$ and $q$ large, the time and space complexity for computing the states of
$\mathcal{A}$ becomes prohibitive, whence the need to compute the states on-line quickly without having to precompute or
store them. We introduced MP-RAMs to this end.

Pattern-matching algorithms are often given on RAMs. This model is not good when there are too many
different values to be stored, for example the $O((w + 1)^k)$ states of $\mathcal{A}$. As early as 1974, the motivation of [PRS74] for
introducing "vector machines" was the remark that boolean bit-wise operations and shifts, which are implemented
on computers, are faster and better suited for many problems. This work was the starting point of a series of
papers [TRL92], [BG95] comparing the complexities of computations on various models of machines allowing
for boolean bit-wise operations and shifts with computation complexities on classical machines, such as Turing
machines, RAMs, etc. The practical applications of this technique to various pattern-matching problems start with
[BYG92], [WM92]: they are known as bit-parallelism, or shift-OR techniques. We follow this track with the episode
search problem, close to the problems studied in [BYG92], [WM92], [BYN96], albeit different from these problems.

In the sequel, we use a variant of RAMs, which is a more realistic computation model in some aspects, and
we encode $\mathcal{A}$ to ensure that (i) each state of $\mathcal{A}$ is stored in a single memory cell and (ii) only the most basic
microprocessor operations are used to compute the transitions of $\mathcal{A}$. Our RAMs have the same control structures
as classical RAMs¹, but the operations are enriched by allowing for boolean bit-wise operations and shifts, which
we will preferably use whenever possible. Such RAMs are close to microprocessors; this is why we called them
MP-RAMs.

Definition 1 An MP-RAM is a RAM extended by allowing new operations:

1) the bit-wise and, denoted by &,

2) the left shift, denoted by $\ll$ or shl, and

3) the right shift, denoted by $\gg$ or shr.

The new operations are low-level operations, executable much faster than the more complex MULT, DIV operations.



1. See [AHU74] pages 5-11, for a definition of classical RAMs.
Figure 3: Trie representing tu, tue and tutu. The full black circles indicate ends of patterns.



Example 2 Assume our MP-RAMs have unbounded memory cells. We will have for example: $(10110\ \&\ 01101) =
100$, $(10110 \ll 4) = 101100000$ and $(10110 \gg 3) = 10$. If memory cells have at most 8 bits, we will have:
$(10110 \ll 4) = 1100000$, that will be written as $(00010110 \ll 4) = 01100000$.
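Example 2 can be replayed with Python integers, which behave like unbounded memory cells. The helper `shl` and its `cell_bits` parameter are our own illustrative names: they model a bounded cell by discarding the bits shifted out on the left.

```python
def shl(x: int, s: int, cell_bits=None) -> int:
    """Left shift; if cell_bits is given, keep only the low cell_bits bits."""
    y = x << s
    if cell_bits is not None:
        y &= (1 << cell_bits) - 1   # bits pushed out of the cell are lost
    return y


def shr(x: int, s: int) -> int:
    """Right shift; the s low-order bits are dropped."""
    return x >> s


print(bin(0b10110 & 0b01101))        # bit-wise and
print(bin(shl(0b10110, 4)))          # unbounded cell
print(bin(shr(0b10110, 3)))
print(format(shl(0b10110, 4, 8), "08b"))  # 8-bit cell
```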

3. Parallel search of several patterns 

Let us recall the problems. Given patterns $m_1, m_2, \ldots, m_q$, we can:

- either count the number of occurrences of each $m_i$ in a $w$-window (not necessarily the same one);

- or count the number of $w$-windows containing $m_1, m_2, \ldots, m_q$.

The algorithm we described in [BCGM01] for counting the number of $w$-windows containing a single pattern
$m$ can be adapted to all these cases; only the acceptance or counting condition will change.

To search simultaneously several patterns $m_1, \ldots, m_q$, [WM92] propose a method concatenating all the patterns.
To search simultaneously several episodes $m_1, \ldots, m_q$, we generalise our algorithm [BCGM01]: we use $q$
counters $c_1, \ldots, c_q$ initially set to 0, and we define an appropriate multiple counting condition such that each time
$m_i$ is in a $w$-window, the corresponding counter $c_i$ is incremented. This method has a drawback: if the patterns are
too long, it will need more than one memory cell for coding the states of the automaton. For searching multiple
patterns, the method proposed by [DFGGK97] to optimise the search, when words $m_1, \ldots, m_q$ have common prefixes,
is to organise $m_1, \ldots, m_q$ in a trie [K97] before applying the standard algorithm. We apply our algorithm on
MP-RAMs in a similar way, and implement tries in a new way. We thus can encode the set of patterns compactly,
and then encode the states of the automaton on a single memory cell.

3.1. Representing patterns by a trie 

Consider for example episodes $m_1$ = tu, $m_2$ = tue, and $m_3$ = tutu. We choose this example because it
illustrates most of the difficulties in encoding the automaton: episode taie is very simple because all letters are
different, tati is less simple because there are two occurrences of t which must be distinguished, tutu is a bit more
complex (the first occurrence of tu must be distinguished from the second one), turlututu would be even more
complex. We represent these three episodes by the trie pictured in Figure 3.

We implement this trie by the three tables below:

tr = | t | u | e | t | u |

pr = | 0 | 1 | 2 | 2 | 4 |

f  = | 2 | 3 | 5 |


Table $tr$ represents the "flattened" trie. Predecessors are in table $pr$: $pr[i]$ gives the index in $tr$ of the parent of
$tr[i]$ in the trie; 0 means there is no predecessor and hence it is a pattern start². Finally $f$ marks pattern ends: $f[i]$
is the index in $tr$ of the end of pattern $i$.
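A possible way to obtain the three tables from a list of patterns is sketched below. The function name `build_flat_trie` and the use of a dictionary of children are our assumptions; the paper does not specify how the tables are built. Indices start at 1 (slot 0 of `tr` and `pr` is unused) so that 0 can denote a pattern start.

```python
def build_flat_trie(patterns):
    """Return (tr, pr, f) with 1-based node indices; index 0 is the virtual root."""
    tr = [None]          # tr[i]: letter labelling node i
    pr = [None]          # pr[i]: index of the parent of node i, 0 for a pattern start
    f = []               # f[j]: index of the node ending pattern j (patterns in input order)
    child = {}           # (parent index, letter) -> node index
    for p in patterns:
        node = 0
        for ch in p:
            if (node, ch) not in child:      # new trie node
                tr.append(ch)
                pr.append(node)
                child[(node, ch)] = len(tr) - 1
            node = child[(node, ch)]         # shared prefixes reuse existing nodes
        f.append(node)
    return tr, pr, f


tr, pr, f = build_flat_trie(["tu", "tue", "tutu"])
print(tr[1:], pr[1:], f)
```

Nodes are numbered in order of creation, so a node always has a larger index than its parent; the masks defined in Section 3.2 rely on $i - pr[i]$ being positive.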

3.2. Preprocessing the trie and algorithm 

We preprocess the trie of patterns and this gives us a finite state automaton $\mathcal{A}$. Its alphabet is $A$. The states are
the $k$-tuples of integers $(l_1, \ldots, l_k)$ with $l_j$ belonging to $\{1, \ldots, w, +\infty\}$, where $k$ is the size of table $tr$ and $w$ the
window size.



2. Numbering of indices starts at 1 in order to indicate pattern starts by 0. 
binary expansion of L = | 0: binary expansion of $l_k$ | ... | 0: binary expansion of $l_2$ | 0: binary expansion of $l_1$ |

Figure 4: Encoding of $(l_1, \ldots, l_k)$.

L = | 0:$\bar{l_5}$ | 0:$\bar{l_4}$ | 0:$\bar{l_3}$ | 0:$\bar{l_2}$ | 0:$\bar{l_1}$ |

Figure 5: Encoding of $(l_1, \ldots, l_5)$; $\bar{l_i}$ is the binary expansion of $l_i$.




We describe informally the behaviour of $\mathcal{A}$. $\mathcal{A}$ scans $t$; it will be in state $(l_1, \ldots, l_k)$ after scanning $t_1 \ldots t_m$ iff,
for $i = 1, \ldots, k$, $l_i$ is the length of the shortest suffix³ of $t_1 \ldots t_m$ of length at most $w$ containing $tr[j_i] \ldots tr[i]$ as a
subsequence, where $tr[j_i] \ldots tr[i]$ is the sequence of letters labelling the path going from the root of the trie to the
node represented by $tr[i]$. If no suffix (of length at most $w$) of $t_1 \ldots t_m$ contains $tr[j_i] \ldots tr[i]$ as a subsequence,
we let $l_i = +\infty$.

Let us now describe our algorithm. Let $\Omega$ be the least integer such that $w + 2 \le 2^\Omega$. The role of $+\infty$ is played
by $2^\Omega - 1$, whose binary encoding is a sequence of $\Omega$ ones. We define the function $\mathrm{Next}_\Omega$ by:

$$\mathrm{Next}_\Omega(l) = \begin{cases} l + 1 & \text{if } l < 2^\Omega - 1, \\ 2^\Omega - 1 & \text{else.} \end{cases}$$

State $(l_1, \ldots, l_k)$ is encoded by the integer:

$$L = \sum_{i=1}^{k} l_i\, 2^{(\Omega+1)(i-1)} = \sum_{i=1}^{k} \bigl(l_i \ll ((\Omega+1)(i-1))\bigr). \qquad (1)$$

Let $\bar{l_i}$ denote the binary expansion of $l_i$, $i = 1, \ldots, k$, prefixed by zeros in such a way that $\bar{l_i}$ occupies $\Omega$ bits (all
$l_i$'s are at most $2^\Omega - 1$, hence they will fit in $\Omega$ bits). The binary expansion of $L$ is obtained by concatenating
the $\bar{l_i}$'s, each prefixed by a zero (Figure 4). These initial zeros are needed for implementing function $\mathrm{Next}_\Omega$, to
indicate overflows. Every integer smaller than $2^{k(\Omega+1)}$ can be written as $k$ big blocks of $(\Omega + 1)$ bits; the first bit
of each big block is 0 in the encoding of a state (and is called the overflow bit) and the $\Omega$ remaining bits constitute a small block. The blocks
are numbered 1 to $k$ from right to left (the rightmost block is block 1, the leftmost block is block $k$).

By the definition in equation (1), the initial state $(+\infty, \ldots, +\infty)$ is encoded by:

$$I_0 = \sum_{i=1}^{k} (2^\Omega - 1)\, 2^{(\Omega+1)(i-1)} = \sum_{i=1}^{k} \bigl(((1 \ll \Omega) - 1) \ll ((\Omega+1)(i-1))\bigr).$$

One might see a multiplication here. In fact we will need a loop for $i = 1$ to $k$. Each time we go through the loop
we will execute a shift of $\Omega + 1$, and the multiplication will disappear. All equations below are treated in the
same way.

Assume that the window size is $w = 13$, hence $\Omega = 4$. With the notations of Figure 5, state $I = (2, 5, \infty, 5, \infty)$
is encoded by:

L = | 0:15 | 0:5 | 0:15 | 0:5 | 0:2 |

The initial state is represented by:

$I_0$ = | 0:1111 | 0:1111 | 0:1111 | 0:1111 | 0:1111 |

3. Word s is a prefix (resp. suffix) of word t iff there exists a word v such that t = sv (resp. t = vs). 
or, writing $\bar{1}$ instead of the $\Omega$ ones representing $\infty$:

$I_0$ = | 0:$\bar{1}$ | 0:$\bar{1}$ | 0:$\bar{1}$ | 0:$\bar{1}$ | 0:$\bar{1}$ |
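The encoding of equation (1) can be sketched as follows; the names `omega_for`, `encode` and `show` are illustrative only. The loop accumulates the shift amount by additions of $\Omega + 1$, so that no multiplication is needed to place the blocks.

```python
def omega_for(w: int) -> int:
    """Least integer om such that w + 2 <= 2**om."""
    om = 1
    while w + 2 > (1 << om):
        om += 1
    return om


def encode(state, om: int) -> int:
    """Pack (l_1, ..., l_k) into one integer; block 1 is the rightmost block."""
    L, shift = 0, 0
    for l in state:              # l_1 first
        L |= l << shift          # place l_i in its small block
        shift += om + 1          # next big block: one more shift of om + 1 bits
    return L


def show(L: int, k: int, om: int) -> str:
    """Render L as k big blocks 'overflow:small', leftmost block first (display only)."""
    small = (1 << om) - 1
    parts = []
    for i in range(k - 1, -1, -1):
        block = L >> (i * (om + 1))
        parts.append(f"{(block >> om) & 1}:{block & small}")
    return " | ".join(parts)


om = omega_for(13)
inf = (1 << om) - 1              # code of +infinity
print(show(encode((2, 5, inf, 5, inf), om), 5, om))
print(show(encode((inf,) * 5, om), 5, om))
```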



In transition $I = (l_1, \ldots, l_k) \xrightarrow{\sigma} I' = (l'_1, \ldots, l'_k)$, the $i$-th component $l'_i$ of the new state $I'$ is either $\mathrm{Next}_\Omega(l_{pr[i]})$
or $\mathrm{Next}_\Omega(l_i)$ according to whether the scanned letter $\sigma$ is equal to $tr[i]$ or not. The cases $l'_i = \mathrm{Next}_\Omega(l_{pr[i]})$ and
$l'_i = \mathrm{Next}_\Omega(l_i)$ respectively yield a first type computation and a second type computation.



To generalise the algorithm of [BCGM01], we must define several masks $M_\sigma^j$ for each letter $\sigma$ of alphabet $A$.
If $\sigma$ has several occurrences in table $tr$, we will need as many masks as occurrences $tr[i]$ and $tr[i']$ of $\sigma$ with
$j = i - pr[i] \ne i' - pr[i'] = j'$ (a single mask will suffice for the set of all occurrences such that $i - pr[i]$ has
the same value $j$, because they correspond to the same shift of $j$ big blocks). The $M_\sigma^j$ are the masks preparing
first type computations. Precisely, if $tr[i] = \sigma$ and $i - pr[i] = j$, the operation $(L \ll j(\Omega + 1))\ \&\ M_\sigma^j$ will shift
everything $j$ big blocks leftwards and will erase the blocks for which $\sigma \ne tr[i]$ or $i - pr[i] \ne j$. For $i \ge 1$, the
$i$-th block will thus contain $l_{pr[i]}$ iff $tr[i] = \sigma$ and $i - pr[i] = j$. It will contain 0 otherwise.



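In our example ($m_1$ = tu, $m_2$ = tue, and $m_3$ = tutu), we will need two masks $M_t^j$ but a single mask $M_u^j$ will
suffice:

$M_t^1$ = | 0:0 | 0:0 | 0:0 | 0:0 | 0:1 |

$M_t^2$ = | 0:0 | 0:1 | 0:0 | 0:0 | 0:0 |

$M_u^1$ = | 0:1 | 0:0 | 0:0 | 0:1 | 0:0 |

$M_e^1$ = | 0:0 | 0:0 | 0:1 | 0:0 | 0:0 |

where, in the small blocks, 0 stands for $\bar{0} = 0000$ and 1 stands for $\bar{1} = 1111$.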

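Mask $N_\sigma$ is the complement of $\sum_j M_\sigma^j$, preparing second type computations. The operation $L\ \&\ N_\sigma$ will erase
the blocks for which $\sigma = tr[i]$. For our example, we have (with the same convention for the small blocks):

$N_t$ = | 0:1 | 0:0 | 0:1 | 0:1 | 0:0 |

$N_u$ = | 0:0 | 0:1 | 0:1 | 0:0 | 0:1 |

$N_e$ = | 0:1 | 0:1 | 0:0 | 0:1 | 0:1 |
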



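Generally, if $k$ is the size of table $tr$,

$$M_\sigma^j = \sum_{\substack{1 \le i \le k \\ tr[i] = \sigma \text{ and } pr[i] = i - j}} \bigl(((1 \ll \Omega) - 1) \ll ((\Omega + 1)(i - 1))\bigr)$$

and $N_\sigma$ is the complement of $\sum_j M_\sigma^j$ (taken on the small blocks only, the overflow bits remaining 0 as in the example above).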
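The masks can be computed from the tables $tr$ and $pr$ in one pass over the $k$ trie nodes. The sketch below assumes 1-based tables as in Section 3.1; `build_masks` and `pack` are hypothetical helper names, and the complement defining $N_\sigma$ is restricted to the small blocks.

```python
def pack(blocks, om: int) -> int:
    """Pack big-block values given leftmost block first (helper for displaying masks)."""
    x = 0
    for b in blocks:
        x = (x << (om + 1)) | b
    return x


def build_masks(tr, pr, om: int):
    """Return (M, N): M[(letter, j)] is the mask M_letter^j, N[letter] the mask N_letter."""
    k = len(tr) - 1                      # slot 0 of tr and pr is unused
    ones = (1 << om) - 1                 # a small block full of ones
    M, used, all_blocks = {}, {}, 0
    shift = 0
    for i in range(1, k + 1):
        block = ones << shift            # small block number i
        all_blocks |= block
        key = (tr[i], i - pr[i])         # same letter and same shift share one mask
        M[key] = M.get(key, 0) | block
        used[tr[i]] = used.get(tr[i], 0) | block
        shift += om + 1
    N = {a: all_blocks & ~u for a, u in used.items()}
    return M, N


tr = [None, "t", "u", "e", "t", "u"]
pr = [None, 0, 1, 2, 2, 4]
M, N = build_masks(tr, pr, 4)
print(sorted(M))                         # which masks M_sigma^j are needed
```

For the running example this yields the keys `('t', 1)`, `('t', 2)`, `('u', 1)` and `('e', 1)`, matching the masks listed above.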

Transition $I = (l_1, \ldots, l_k) \xrightarrow{\sigma} I' = (l'_1, \ldots, l'_k)$ is computed by:

$$T = \sum_j \bigl((L \ll j(\Omega + 1))\ \&\ M_\sigma^j\bigr) + (L\ \&\ N_\sigma) + E_1$$

where:

$E_1$ = | 0:0001 | 0:0001 | 0:0001 | 0:0001 | 0:0001 |

Adding $E_1$ amounts to adding 1 to each small block.

In our example, if we scan letter t, the transition is computed by:

$$T = \bigl((L \ll 2(\Omega + 1))\ \&\ M_t^2\bigr) + \bigl((L \ll (\Omega + 1))\ \&\ M_t^1\bigr) + (L\ \&\ N_t) + E_1$$

yielding for $I = (2, 5, \infty, 5, \infty)$, encoded by:

L = | 0:15 | 0:5 | 0:15 | 0:5 | 0:2 |

the result:

T = | 1:0 | 0:6 | 1:0 | 0:6 | 0:1 |



All the blocks contain the correct result, except for the leftmost block and the middle block where an overflow
occurred. To treat blocks where overflow occurred it suffices to initialise these blocks again by replacing $T$ with
$L' = T - ((T\ \&\ E_2) \gg \Omega)$, where:

$E_2$ = | 1:0 | 1:0 | 1:0 | 1:0 | 1:0 |

We find:

$T\ \&\ E_2$ = | 1:0 | 0:0 | 1:0 | 0:0 | 0:0 |

Hence:

$(T\ \&\ E_2) \gg \Omega$ = | 0:0001 | 0:0 | 0:0001 | 0:0 | 0:0 |

and finally:

$L'$ = | 0:15 | 0:6 | 0:15 | 0:6 | 0:1 |
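The worked transition can be replayed numerically. This sketch hard-codes the masks of the running example ($w = 13$, $\Omega = 4$, scanned letter t); the names `pack` and `step` are ours. It returns both the intermediate value $T$ and the corrected state $L'$.

```python
OM = 4                      # small-block width for w = 13
BIG = OM + 1                # big-block width: overflow bit + small block


def pack(blocks) -> int:
    """Pack big-block values given leftmost block first."""
    x = 0
    for b in blocks:
        x = (x << BIG) | b
    return x


ONES = (1 << OM) - 1                      # small block 1111, also the code of +infinity
L = pack([ONES, 5, ONES, 5, 2])           # state (2, 5, inf, 5, inf)
M_T = {1: pack([0, 0, 0, 0, ONES]),       # M_t^1
       2: pack([0, ONES, 0, 0, 0])}       # M_t^2
N_T = pack([ONES, 0, ONES, ONES, 0])      # N_t
E1 = pack([1] * 5)                        # adds 1 to every small block
E2 = pack([1 << OM] * 5)                  # selects every overflow bit


def step(L: int, masks: dict, n_mask: int):
    """One transition: returns (T, L') as in the text."""
    T = (L & n_mask) + E1                 # second type computations
    for j, m in masks.items():
        T += (L << (j * BIG)) & m         # first type computations
    return T, T - ((T & E2) >> OM)        # overflowed blocks fall back to 1111


T, L_next = step(L, M_T, N_T)
print(format(T, "025b"))
print(format(L_next, "025b"))
```

Subtracting $(T\ \&\ E_2) \gg \Omega$ turns a block 1:0000 into 0:1111, which is exactly the saturation at $+\infty$ required by $\mathrm{Next}_\Omega$.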



Last we define a counter $c_i$ for each pattern $m_i$, and increment it whenever $l_{f[i]} < w + 1$, which is implemented
by: $M_i\ \&\ L < (w + 1)\, 2^{(\Omega+1)(f[i]-1)}$, for $i = 1, \ldots, q$, where $M_i = ((1 \ll \Omega) - 1) \ll ((\Omega + 1)(f[i] - 1))$.

Our algorithm treats the more complex case where we demand that all episodes appear in a same window, a 
case that cannot be treated by the separate counting of the number of windows containing each episode. A simple 
modification of the counting condition enables us to also count with a single scan of the text the number of windows 
containing each individual episode, in a more efficient way than if the text were to be scanned for each episode. 



Theorem 1 There exists an on-line algorithm in time $O(nq)$ solving the parallel search of $q$ serial episodes in a
size $n$ text (assuming the episode alphabet has at most $\sqrt{n}/q$ letters) on MP-RAM.



Proof: Let $a$ be the number of letters of the alphabet. As in [DFGGK97], we treat in the same way all letters not
occurring in the patterns; this leads to defining two masks $M_{other}$ and $N_{other}$ common to all such letters. Let $|w|$
be the length of the binary expansion of $w$. The algorithm consists of four steps:

1) compute (at most) $q \times (k + 1)$ integers representing the masks $M_\sigma^j$, $(k + 1)$ integers representing the masks
$N_\sigma$ and the integers $\Omega$, $A$, $I_0$, $F$, $E_1$, $E_2$; all these integers are of size $k(|w| + 2)$ and are computed simultaneously
in $k$ iterations at most. The integer $k$ is the size of the trie representing the patterns: $k \le \sqrt{n}$.

2) let $c = 0$ ($c$ is the number of $w$-windows containing all the patterns).

3) let $L = I_0$.

4) scan text $t$; after scanning $t_m$, compute the new state $L$ (on-line and without preprocessing with an MP-RAM)
and if $l_{f[i]} \le w$ for $i = 1, \ldots, q$, increment $c$ by 1.

Our algorithm uses only the simple and fast operation &, together with a careful implementation of $\ll$, $\gg$ and
addition. Step 1 of preprocessing is in time $qk(k+1) + q(k+1) + \log(w) \le q(\sqrt{n})^2 + 2q\sqrt{n} + q + \log(w) = O(nq)$;
in general, $k$, $q$ and $w$ are smaller than $n$ by several orders of magnitude and we will have: $qk(k + 1) + q(k + 1) +
\log(w) = o(n)$. In step 4 we scan text $t$ linearly in time $O(n)$ and perform $q$ comparisons (one for each counter
$c_i$) per scanned letter. Step 4 thus takes time $nq$, hence finally a time complexity $O(nq)$ for the algorithm. □

Figure 6: The continuous thin lines represent the execution time of the MP-RAM algorithm (with trie); the dotted line
represents the execution time of the MP-RAM algorithm (with concatenation); the dashed lines the execution time of the
standard algorithm (with concatenation) and the continuous thick lines the execution time of the standard algorithm (with trie).
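The four steps of the proof can be assembled into a compact sketch. This is our illustrative reading of the algorithm, not the authors' C++ implementation: Python integers stand in for MP-RAM cells, patterns are assumed non-empty, and, as an assumption of the sketch, only full windows are counted (positions $m \ge w$), so that the result is the number of size $w$ factors containing all patterns. The name `count_windows_mpram` is hypothetical.

```python
def count_windows_mpram(text: str, w: int, patterns) -> int:
    # Flattened trie, 1-based as in Section 3.1 (slot 0 unused).
    tr, pr, f, child = [None], [None], [], {}
    for p in patterns:
        node = 0
        for ch in p:
            if (node, ch) not in child:
                tr.append(ch)
                pr.append(node)
                child[(node, ch)] = len(tr) - 1
            node = child[(node, ch)]
        f.append(node)
    k = len(tr) - 1

    om = 1                                   # least om with w + 2 <= 2**om
    while w + 2 > (1 << om):
        om += 1
    big, ones = om + 1, (1 << om) - 1

    # Step 1: masks and constants, built with shifts in k iterations.
    M, erased, I0, E1, E2, shift = {}, {}, 0, 0, 0, 0
    for i in range(1, k + 1):
        block = ones << shift
        I0 |= block                          # initial state: every l_i is +infinity
        E1 |= 1 << shift                     # adds 1 to each small block
        E2 |= 1 << (shift + om)              # overflow bit of each big block
        by_shift = M.setdefault(tr[i], {})
        by_shift[i - pr[i]] = by_shift.get(i - pr[i], 0) | block
        erased[tr[i]] = erased.get(tr[i], 0) | block
        shift += big
    N = {a: I0 & ~e for a, e in erased.items()}
    accept = [(ones << (big * (fi - 1)), (w + 1) << (big * (fi - 1))) for fi in f]

    c = 0                                    # Step 2
    L = I0                                   # Step 3
    for m, a in enumerate(text, start=1):    # Step 4: one transition per letter
        T = (L & N.get(a, I0)) + E1          # letters outside the patterns keep all blocks
        for j, mask in M.get(a, {}).items():
            T += (L << (j * big)) & mask
        L = T - ((T & E2) >> om)
        if m >= w and all((L & mask) < bound for mask, bound in accept):
            c += 1
    return c


print(count_windows_mpram("dans ville il y a vie", 5, ["vie"]))
print(count_windows_mpram("tutue", 5, ["tu", "tue", "tutu"]))
```

Keeping one counter per pattern instead of the single conjunction in the last test would give the other variant discussed in Section 3, where each episode is counted separately in the same scan.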

4. Experimental results 

The algorithm on MP-RAM has a better complexity than the standard algorithm; however, the underlying
computation models being different, we checked experimentally that the MP-RAM algorithm is faster. We implemented
all algorithms in C++. Experiments were carried out on a PC (256 MB, 1 GHz) running Linux. The text was a
randomly generated file. We measured the time in machine clock ticks.

For searching multiple patterns, we took 3 to 5 patterns of length 2 to 4; in Figure 6, case (a) is the case of
patterns having no common prefix, and case (b) is the case of patterns having common prefixes. In case (a), the
MP-RAM algorithm where we concatenate the patterns is at least twice as fast as the standard "naive" algorithm
where patterns are concatenated; both standard algorithms (with patterns concatenated or organised in a trie) are
equivalent, the algorithm with concatenation being slightly faster; this was predictable since a trie organisation
will not give a significant advantage in that case; the MP-RAM algorithm where the patterns are organised in a
trie is 30 to 50% faster than the standard algorithm with trie, and 10 to 15% slower than the MP-RAM algorithm
where the patterns are concatenated. However, as soon as the total length of the patterns is larger than 7 or 8, or the
window size is larger than 30, if patterns are concatenated, the automaton state can no longer be encoded in a single
32-bit memory cell, and it is better to use the MP-RAM algorithm with trie (Figure 6, case (b)). Figure 6, case (b)
shows that, for patterns having common prefixes, the MP-RAM algorithm with trie is 1.3 to 1.5 times faster than
the standard algorithm with trie, itself 1.4 to 1.6 times faster than the standard algorithm with concatenation.

5. Conclusion 

We presented new algorithms for multiple episode search, which our experiments show to be faster than the standard algorithms.
Note that with our method, counting the number of windows
containing several episodes is not harder than checking the existence of one window containing these episodes.
This is not true of most other problems; usually counting problems are much harder than the corresponding
existence problems: for example, for the "matching with don't cares" problem, the existence problem is in linear
time while the counting problem is in polynomial time [KR97], and in the particular case of [MBY91], the existence
problem is in logarithmic time while the counting problem is in sub-linear time.